Convex Batch Mode Active Sampling via α-relative Pearson Divergence
Authors
Abstract
Active learning is a machine learning technique that selects a subset of an unlabeled dataset for labeling and trains a classifier on the labeled result. Recently, batch mode active learning, which selects a batch of samples to be labeled in parallel, has attracted a lot of attention. Its challenge lies in the choice of the criterion used to guide the search for the optimal batch. In this paper, we propose a novel approach that selects the optimal batch of queries by minimizing the α-relative Pearson divergence (RPE) between the labeled and the original datasets. This particular divergence is chosen because it can distinguish the optimal batch more easily than other measures, especially when the available candidates are similar. The proposed objective is a min-max optimization problem, which is difficult to solve because it involves both minimization and maximization. We find that the objective has an equivalent convex form, so a globally optimal solution can be obtained, and the subgradient method can then be applied to the simplified convex problem. Our empirical studies on UCI datasets demonstrate the effectiveness of the proposed approach compared with state-of-the-art batch mode active learning methods.

Introduction

Active learning is proposed to alleviate the effort of the labeling process by selecting informative data samples. It is useful when unlabeled data are abundant but manual labeling is expensive. The challenge of active learning is that, given a large pool of unlabeled data and a relatively small labeling budget, the classifier trained on the selected labeled data must generalize well to unseen data. In other words, an active learning algorithm selects only a few data instances for labeling while maintaining a certain level of classification performance. Traditional active learning approaches, which select the single most informative example, usually retrain the classifier whenever a new instance is labeled. When multiple annotators work concurrently, batch mode active learning, which iteratively selects a batch of queries to label, is more efficient and appropriate. In the batch mode active learning process, the learner is given a labeled set and an unlabeled set, and iteratively chooses a batch of instances from the unlabeled set to query for labels. The main difficulty of batch active learning is the criterion under which the batch is selected.

One of the most recent works in batch mode active learning uses representative information based on distribution matching (Chattopadhyay et al. 2013). It adopts the maximum mean discrepancy (MMD) (Gretton et al. 2006) and selects the batch that minimizes the empirical MMD score between the labeled and unlabeled data. It turns out, however, that MMD cannot effectively distinguish between the optimal batch and the other candidates (Settles 2010; Wang and Ye 2013). As more data are labeled, the induced candidate batches become more similar, which makes this problem of MMD even more serious. To handle this issue, we propose to use an alternative measure, the α-relative Pearson divergence (RPE) (Yamada et al. 2011), which is more appropriate for comparing distributions because of the following distinct properties (a code sketch contrasting the two criteria appears after the contribution list below).
First, divergences, such as the K-L divergence (Kullback and Leibler 1951), are well known to be suitable for distribution comparison. Second, the superiority of RPE over MMD has been shown in two-sample distribution matching, so the effectiveness of RPE in distribution-oriented batch active learning can also be expected. In addition, RPE often gives a larger dissimilarity score when two distributions are similar but non-identical (Yamada et al. 2011). With these two properties, RPE is usually able to distinguish between the optimal batch and the other candidates, and this advantage grows as the distributions represented by the different candidates become more similar. Our main contributions can be summarized as follows.

1. We propose a novel batch mode active learning algorithm based on the α-relative Pearson divergence (RPE), whose properties are well suited to this task.

2. Using RPE in batch active learning introduces an auxiliary variable. As a result, the overall objective becomes a min-max optimization problem, which is generally hard to solve because the objective function is simultaneously maximized with respect to the first variable and minimized with respect to the second. To address this, we show that the objective has an equivalent convex form, so that a globally optimal solution can be obtained with the subgradient method.
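To make the two competing criteria concrete, here is a minimal NumPy sketch (an illustration, not the paper's implementation): empirical_mmd2 computes the plug-in squared MMD score of Gretton et al. (2006), and rpe_rulsif estimates the α-relative Pearson divergence with the closed-form RuLSIF-style estimator of Yamada et al. (2011). The Gaussian kernel bandwidth sigma, the ridge parameter lam, and the mixture weight alpha are assumed fixed here; in practice they would be tuned, e.g., by cross-validation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian kernel: K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def empirical_mmd2(X, Y, sigma=1.0):
    """Biased plug-in estimate of the squared MMD between samples X ~ p and Y ~ q."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

def rpe_rulsif(X, Y, alpha=0.5, sigma=1.0, lam=0.1):
    """Estimate the alpha-relative Pearson divergence between X ~ p and Y ~ q.

    Fits the relative density ratio r(x) = p(x) / (alpha p(x) + (1 - alpha) q(x))
    as a kernel model centered at the points of X, then plugs the fit into the
    analytic divergence estimator.
    """
    n_x, n_y = len(X), len(Y)
    Kx = gaussian_kernel(X, X, sigma)   # model evaluated on p-samples
    Ky = gaussian_kernel(Y, X, sigma)   # model evaluated on q-samples
    # Ridge-regularized least-squares fit of the ratio model's coefficients.
    H = alpha * (Kx.T @ Kx) / n_x + (1.0 - alpha) * (Ky.T @ Ky) / n_y
    h = Kx.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(n_x), h)
    r_x, r_y = Kx @ theta, Ky @ theta   # fitted ratio values on both samples
    # Plug-in estimate of the alpha-relative Pearson divergence.
    return (-alpha * np.mean(r_x**2) / 2.0
            - (1.0 - alpha) * np.mean(r_y**2) / 2.0
            + np.mean(r_x) - 0.5)
```

In the distribution-matching formulation above, X would roughly play the role of the labeled set augmented with a candidate batch and Y the original dataset; among the candidates, the batch yielding the smallest divergence is preferred.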
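The convex reformulation itself is not reproduced in this abstract, so the following is only a generic sketch of the solver named in contribution 2: a projected-subgradient loop with a diminishing step size. The arguments subgrad (a subgradient oracle for the convex objective) and project (Euclidean projection onto the feasible set) are hypothetical stand-ins for the paper's actual objective and constraints.

```python
import numpy as np

def projected_subgradient(x0, subgrad, project, n_iters=500, step0=1.0):
    """Minimize a convex (possibly nonsmooth) function via x <- P(x - t_k * g_k)."""
    x = project(np.asarray(x0, dtype=float))
    for k in range(1, n_iters + 1):
        g = subgrad(x)                             # any subgradient at x
        x = project(x - (step0 / np.sqrt(k)) * g)  # diminishing step size
    return x

# Toy usage: minimize the nonsmooth f(x) = ||x - c||_1 over the box [0, 1]^3
# (a stand-in objective; the paper's objective is the convexified RPE criterion).
c = np.array([0.2, 1.4, -0.3])
x_star = projected_subgradient(np.zeros(3),
                               subgrad=lambda x: np.sign(x - c),
                               project=lambda x: np.clip(x, 0.0, 1.0))
# x_star approaches np.clip(c, 0, 1) = [0.2, 1.0, 0.0]
```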
Similar Resources
Active Instance Sampling via Matrix Partition
Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be f...
Full Text
Discriminative Batch Mode Active Learning
Active learning sequentially selects unlabeled instances to label, with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time, retraining in each iteration. Recently, a few batch mode active learning approaches have been proposed that select a set of the most informative un...
Full Text
Dynamic Batch Mode Active Learning via L1 Regularization
We propose a method for dynamic batch mode active learning where the batch size and selection criteria are integrated into a single formulation.
Full Text
Divergence Function, Duality, and Convex Analysis
From a smooth, strictly convex function $\phi: \mathbb{R}^n \to \mathbb{R}$, a parametric family of divergence functions $D_\phi^{(\alpha)}$ may be introduced:

$$D_\phi^{(\alpha)}(x, y) = \frac{4}{1-\alpha^2}\left(\frac{1-\alpha}{2}\,\phi(x) + \frac{1+\alpha}{2}\,\phi(y) - \phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right)\right)$$

for $x, y \in \operatorname{int}\operatorname{dom}(\phi) \subset \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, with $D_\phi^{(\pm 1)}$ defined by taking the limit in $\alpha$. Each member is shown to induce an α-independent Riemannian metric, as well as a pair of dual α-connections, which are...
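As a quick worked check of the family just defined (an illustration added here, not part of the cited abstract): taking $\phi(x) = \|x\|^2$ collapses every member to the squared Euclidean distance, independent of $\alpha$. Writing $a = \frac{1-\alpha}{2}$ and $b = \frac{1+\alpha}{2}$, so that $a + b = 1$ and $ab = \frac{1-\alpha^2}{4}$:

```latex
% Worked check: with \phi(x) = \|x\|^2, expand the quadratic and use a + b = 1:
%   a\|x\|^2 + b\|y\|^2 - \|ax + by\|^2 = ab\,\|x - y\|^2
\[
D^{(\alpha)}_\phi(x, y)
  = \frac{4}{1-\alpha^2}\Bigl(a\|x\|^2 + b\|y\|^2 - \|ax + by\|^2\Bigr)
  = \frac{4}{1-\alpha^2}\, ab\, \|x - y\|^2
  = \|x - y\|^2 .
\]
```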
Full Text
A Batch-mode Active Learning Method Based on the Nearest Average-class Distance (NACD) for Multiclass Brain-Computer Interfaces
In this paper, a novel batch-mode active learning method based on the nearest average-class distance (ALNACD) is proposed to solve multi-class problems with Linear Discriminant Analysis (LDA) classifiers. Using the Nearest Average-class Distance (NACD) query function, the ALNACD algorithm selects a batch of the most uncertain samples from the unlabeled data to gradually improve pre-trained classifiers'...
Full Text